Abstract. This article presents a spiking neuroevolutionary system
that implements memristors as plastic connections, i.e. connections whose
weights can vary during a trial. The evolutionary design process exploits
parameter self-adaptation and variable topologies, allowing the number
of neurons, connection weights, and inter-neural connectivity pattern
to emerge. By comparing two phenomenological real-world memristor
implementations with networks composed of (i) linear resistors and (ii)
constant-valued connections, we demonstrate that this approach allows
networks of appropriate complexity to evolve whilst exploiting the
memristive properties of the connections to reduce learning time. We
extend this approach to allow for heterogeneous mixtures of memristors
within the networks; our approach provides an in-depth analysis of
network structure. Our networks are evaluated on simulated robotic
navigation tasks; results demonstrate that memristive plasticity enables
higher performance than constant-weighted connections in both static and
dynamic reward scenarios, and that mixtures of memristive elements
provide performance advantages when compared to homogeneous memristive
networks.

Keywords: Memristors, Genetic Algorithms, Neurocontrollers, Hebbian
Theory

1. Introduction 

The field concerned with nanoscale brain-like information processing is
known as Neuromorphic Computation (NC) [1]. NC is a new way of computing
in hardware that blurs the distinction between processor and memory, as
both may be distributed at any spatial position in the architecture.
Neuron-like units, such as Complementary Metal-Oxide Semiconductor (CMOS)
neurons [2], are densely interconnected by numerous adaptive synapses that
communicate via transmission of spikes. Neuromorphic architectures have
yet to be physically realised, but are envisioned to incorporate many
attractive characteristics, including redundancy, self-organisation,
adaptation, and
learning [1].
NC has recently become more viable thanks to the manufacture of the
memristor [3] (memory resistor) at HP labs [4]. A memristor is a
fundamental passive two-terminal circuit element whose state (memristance)
is both nonvolatile and dependent upon past activity. Nonvolatile memory
[5] is well suited to low-power storage, and the device's dynamic internal
state facilitates information processing. These properties make the
memristor ideal for use as a nanoscale synapse in NC architectures [6]. A
proposed approach to realise learning in NC involves harnessing Hebbian
principles [7] to realise Spike Time Dependent Plasticity (STDP) [8],
allowing connections between a presynaptic and postsynaptic neuron to
alter efficacy dependent on the spike timings of those neurons.

It has been reasoned that, much like the brain, different areas of NC
architectures could be responsible for different activities. To this end,
we focus on the evolution of self-organizing small-scale Spiking Neural
Networks (SNNs [9]), where each network can be conceptualised as being
representative of part of a larger NC architecture. We employ a model of
neuroevolution whereby each network in the population initially comprises
a number of hidden layer neurons, connected to a problem-dependent, fixed
number of input and output neurons. The evolutionary process can then
alter network topology as part of the Genetic Algorithm (GA) [10].

In this study, we initially compare two phenomenological memristor
implementations, Hewlett-Packard (HP)-like [4] and Polyethylene Oxide -
Polyaniline (PEO-PANI)-like [11], and analyse their computational
properties when cast as synaptic connections in evolutionary SNNs. The
memristive element of the network is designed to allow the weight of the
connections to vary during a trial, providing a learning architecture
which may be beneficial to the evolutionary design process. We then allow
for the evolution of heterogeneous memristive networks (e.g. those
containing all memristor types), and investigate whether such mixtures
give an inherent performance advantage when compared to their homogeneous
counterparts. As the equations used to govern the memristors are based on
physical devices, the evolved networks represent possible behaviours of
partial NC architectures (e.g. in the context of evolvable hardware [12]).
Performance is evaluated on simulated robotic navigation tasks.

Our initial hypothesis is that memristive synapses provide the networks
with increased performance. To test this hypothesis, we compare the
homogeneous memristive networks (HP and PEO-PANI) to networks solely
comprised of linear resistors (e.g. [13]) and constant-weighted elements.
Extending the hypothesis to heterogeneous networks, we seek to confirm
that varied memristive behaviours can be harnessed by the evolutionary
design process to provide further advantages, namely that (i) certain
functionality can be more easily achieved by certain memristor types and
(ii) combinations of memristor types are beneficial to the networks.
Specifically, we aim to answer the following research questions:
(1) Does the evolutionary process allow for the successful generation of
memristive networks that outperform constant-valued connections,
despite the memristors' nonlinearity and given the potential for
complex interactions within memristive networks?

(2) In the heterogeneous case, do mixtures of memristors provide better 
performance than other implementations? How do such networks 
generate useful behaviour? 

(3) Is there an evolutionary preference in assigning specific roles to spe- 
cific memristor types based on variations in their memristive be- 
haviours? 

1.1. Roadmap. The remainder of the article is ordered as follows: Section 
II introduces background research. Section III introduces the system. Sec- 
tion IV details the spiking implementation. Section V outlines memristor 
implementations. Section VI details the GA. Section VII details the net- 
work topology mechanisms. Section VIII gives the environment. Section IX 
details the experimental setup. Sections X, XI, and XII analyse the results 
of the experiments that were carried out and highlight the main differences 
between the memristor models. Section XIII provides a summary. 

2. Background 

2.1. Spiking Networks and Evolutionary Spiking Networks. SNNs 
present a biologically plausible phenomenological model of neural activity 
in the brain. In a SNN, neurons are linked via unidirectional, weighted 
connections that act as communication carriers. When two neurons (A 
and B) are connected, neuron A is either (a) presynaptic to neuron B (the 
connection is directed from neuron A to neuron B) or (b) postsynaptic to 
neuron B, if the connection is directed from neuron B to neuron A. 

The medium of communication is the action potential, or spike, which 
is emitted from a presynaptic neuron and received by all connected post- 
synaptic neurons. Each neuron has an internal state, known as "membrane 
potential", which is influenced by spike reception but decreases over time. 
Spikes are emitted from a given neuron after this state surpasses a certain 
level of excitation (received either from the environment or from presynaptic 
neurons). This time-dependent build-up of membrane potentials and release
of spikes is able to produce dynamic activation patterns through time.

The earliest equations that describe SNNs were given by Lapicque in
1907 [14]. Two popular formal SNN implementations are the Leaky Integrate
and Fire (LIF) model [9] (which is derived from [14]) and the Spike
Response Model (SRM) [9]. The main justifications for using SNNs are
(i) increased utility when compared to other network models, e.g. the
MLP [15], as shown in [16][17], and (ii) current NC research focussing on
spiking neurons as a basis of communication due to low power requirements
and the ability to harness spikes as a learning mechanism (e.g.
[18][19]).
Application of evolutionary techniques to neural networks involves the use
of a GA to alter connection weights, network topology, connectivity
patterns, or combinations of the above. A survey of various methods for
evolving both weights and architectures in neural networks is presented in
[20]. Neuroevolution was first applied to LIF SNNs to evolve networks that
produce temporally-dependent outputs [21], and SRM spiking networks were
first evolved for a vision-based navigation task [22].

As this paper concerns robotics tasks, a short overview of spiking
neuroevolutionary robotics follows. Nolfi and Floreano [23] provide a
review. SNN circuits were evolved to model abstractions of the biological
retina in [24], where the system was applied to a robotics platform. An
LIF spiking model is used in [25], again with the goal of evolving
navigation behaviours. A similar spiking model is applied to a simple
robotic navigation task in [26]; the authors conclude that the dynamics of
an SNN provide further degrees of problem-solving freedom given
temporally-sensitive problems. A recent hardware implementation is given
in [27].



2.2. Memristors. Memristors (memory-resistors) are the fourth fundamental
circuit element, joining the capacitor, inductor and resistor [28]. A
memristor can be defined as a resistor whose instantaneous resistance
value (a) depends on all charge that has passed through it and (b) is
nonvolatile. Formally, a memristor is a passive two-terminal electronic
device that is described by the non-linear relationship between the device
terminal voltage, v, and terminal current, i, as shown in (1).
Nonlinearity arises because the instantaneous memristance, M, depends on
the charge q (2), where $\varphi$ is the time integral of voltage, or
magnetic flux.



$v = M(q)\,i$ (1)

$M(q) = d\varphi(q)/dq$ (2)

The memristor was theoretically characterized and named by Chua in
1971 [3]. Memristive systems have recently enjoyed a resurgence of interest
from the research community after being manufactured by HP labs [4]. This
has spawned a number of research avenues in terms of applications of
memristive systems [29] [30], and synthesis of various other memristors
[31] [32].

There are many reasons to think that memristors might be useful in NC.
Primarily, memristors can be manufactured at the required scale and
implement synaptic behaviour in hardware [33]. HP memristors [4] have been
used in the manufacture of nanoscale neural crossbars [34], which have been
applied to pattern recognition circuits [30]. Silver Silicide memristors
have been shown to function in neural architectures [32]. Memristor theory
has also been used to model learning in amoeba [35]. In particular, [30]
highlights the attractive prospect of applying evolutionary computation
techniques directly to memristive hardware, as memristors can
simultaneously perform the functions of both processor and memory. This
work will focus on the use of the memristor as an adaptive synapse.

2.3. Synaptic Plasticity. Hebbian learning [7] is thought to account for
synaptic adaptation and learning in the brain. Briefly, Hebbian learning
states that "neurons that fire together, wire together"; in other words,
in the event that a presynaptic neuron causes a postsynaptic neuron to
fire, the synaptic strength between those two neurons is increased so that
such an event is more likely to happen in the future. This mechanism
allows for self-organising, correlated activation patterns and is
therefore of particular relevance when considering learning in neural
systems.

Spike Time Dependent Plasticity (STDP) [8] was originally formulated
as a way of implementing Hebbian learning within computer-based neural
networks. Interestingly, the STDP equation has been found to have distinct
similarities to the reality of Hebbian learning in biological synapses
[36]. It has recently been postulated that a memristance-like mechanism
affects synaptic efficacy in biological neural networks [33], based on
similarities between memristive equations and their neural counterparts.

Integration of neuroevolution with neuromodulatory networks is
investigated by Soltoggio (e.g., [37]). In these networks, dedicated
modulatory neurons are responsible for affecting the inputs received by
traditional neurons. Heterogeneous modulation rules are available to the
networks, although unlike memristors they have no direct hardware
analogue. The networks are tested on agent navigation tasks and robot
controllers [38], both with promising results. A probabilistic SNN model
is investigated in [39], whereby the probability of spike transmission
across a synapse is affected by Hebbian learning rules; results
demonstrate the power of plasticity in generating varied behaviour.
Floreano and Urzelai [40] evolve Discrete Time Recurrent Neural Networks
[41] where synapses are affected by four versions of the Hebb rule during
the lifetime of the agent as it solves a navigation task.



Memristive STDP has been implemented in [42] [32] [43] [33], with all
four papers using spike-coincidence based STDP as a learning mechanism.
Also consistent between the papers is the use of a "two-part spike", which
an SNN neuron uses to pass information to both presynaptic and
postsynaptic neurons. The temporal coincidence between presynaptic and
postsynaptic spikes at a memristive synapse alters the voltage across that
synapse; if a threshold voltage is surpassed, the synapse's weight is
altered. The main difference between [42] [32] and [43] [33] is that in
[42] [32], the two-part spike is implemented as a discrete-time stepwise
waveform approximation, whereas [43] [33] use values calculated from
continuous waveform equations, allowing them to operate in continuous
time.

In summary, this literature review has highlighted the previous success of
STDP as a neural learning mechanism, and shown memristors to be an ideal
medium for the implementation of STDP, especially when coupled with SNNs.
The relevance of a robotic navigation task in the context of
neuroevolution is also shown.



3. The System 

The system presented here consists of a population of SNNs which are 
evaluated on a robotics test problem, and altered via GA operation which is 
detailed in section VI. To introduce the terminology to be used throughout 
this paper: each experiment lasted for 1000 evolutionary generations; each 
generation involved new networks in the population being evaluated on the 
test problem (a trial). Each trial consisted of a number of timesteps, which 
began with the reading of sensory information and calculation of action, and 
ended with the agent performing that action. Every timestep consisted of 
21 steps of SNN processing, at the end of which the action was calculated. 



4. Spiking Network Implementation 

We base our spiking implementation on the LIF model. Neurons can be
stimulated either by an external current or by connections from presynaptic
neurons; recurrency and direct input-output connections are illegal. Each
neuron has a membrane potential, y, where y > 0, which slowly degrades over
time according to (3). As spikes are received by the neuron, the value of
y is increased in the case of an excitatory spike, or decreased if the
spike is inhibitory. If y surpasses a positive threshold, $y_{thresh}$, the
neuron spikes and transmits an action potential to every neuron to which
it is presynaptic, with strength relative to the efficacy of the synapse
that connects them. The neuron then resets its membrane potential to some
low number. The membrane potential of a neuron at time t is updated
according to (3); the reset equation is given in (4).



$y(t+1) = y(t) + (I + a - b\,y(t))$ (3)

$\text{if } (y > y_{thresh}) \;\; y = c$ (4)

Here, y(t) is the membrane potential at time t, I is the input current to
the neuron, a is a positive constant, b is the degradation (leak) constant
and c is the reset membrane potential of the neuron. The networks are
arranged into three layers: input (which receives sensory information),
hidden (a variable-size layer), and output (where motor-actions are
calculated). Example architectures can be seen in Figs. 10(d), 11 and
12(d). A model of temporal delays is used so that, in the single hidden
layer only, a spike sent between two neurons is received x steps after it
is sent (see (5)), where $i_s$ is the index of the sending neuron and
$i_r$ is the index of the receiving neuron (indexing is sequential).

$x = i_s - i_r$ (5)
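
The update and reset rules in (3) and (4) can be sketched in a few lines of
Python. This is only an illustrative sketch: the class name `LIFNeuron` and
the default values of a, b, c and the threshold are assumptions chosen for
the example, not parameters taken from this paper, and flooring y at zero
is one possible reading of the y > 0 constraint.

```python
class LIFNeuron:
    """Discrete-time leaky integrate-and-fire neuron following (3) and (4)."""

    def __init__(self, a=0.3, b=0.05, c=0.0, y_thresh=1.0):
        self.a = a                # positive constant (assumed value)
        self.b = b                # leak constant (assumed value)
        self.c = c                # reset potential (assumed value)
        self.y_thresh = y_thresh  # spiking threshold (assumed value)
        self.y = 0.0              # membrane potential

    def step(self, input_current):
        """Advance one processing step; return True if the neuron spikes."""
        # Equation (3): y(t+1) = y(t) + (I + a - b*y(t))
        self.y = self.y + (input_current + self.a - self.b * self.y)
        # Assumption: inhibitory input cannot push y below zero.
        self.y = max(0.0, self.y)
        # Equation (4): reset to c once the threshold is surpassed.
        if self.y > self.y_thresh:
            self.y = self.c
            return True
        return False
```

With these assumed constants, a constant input current of 0.2 drives the
potential to 0.5, then 0.975, and produces a spike on the third step.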



4.1. Action Calculation. Action calculation involves the current input 
state being repeatedly processed 21 times by each network (experimentally 
determined to allow sufficient STDP to occur during the lifetime of the 
agent). For the purposes of this paper, each network was initialised with 
6 input neurons (used to pass sensor values to the network), nine hidden 
neurons, and 2 output neurons that are used to calculate the action. Each 
output neuron had an activation window that recorded the number of spikes 
produced by that neuron over the last 21 steps. We classified the spike trains 
at the two output neurons as being either low or high activated (see (6)). 



$\text{if } (n_s < t_s/2) \; \{low\} \; \text{else} \; \{high\}$ (6)

Here, $n_s$ is the number of spikes in the window and $t_s$ is the window
size. The combined spike trains at the two output neurons translated to a
discrete movement according to the output activation strengths. See
section 8.1 for precise details of sensory state generation and possible
actions.
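
The classification rule in (6) and the mapping from output activations to
movements can be sketched as follows. The function names and the string
labels are illustrative assumptions; the action table itself follows the
(high/low) pairs listed in section 8.1, taking the pair order to be (first
output neuron, second output neuron).

```python
def classify(n_spikes, window=21):
    """Equation (6): 'low' if n_s < t_s / 2, otherwise 'high'."""
    return "low" if n_spikes < window / 2 else "high"


# Output-pair to action table, as listed in section 8.1.
ACTIONS = {
    ("high", "high"): "forward",
    ("low", "low"): "forward",
    ("high", "low"): "left",
    ("low", "high"): "right",
}


def select_action(spikes_first, spikes_second, window=21):
    """Map the spike counts of the two output neurons to a discrete action."""
    key = (classify(spikes_first, window), classify(spikes_second, window))
    return ACTIONS[key]
```

For a window of 21 steps, 10 spikes or fewer is classed as low and 11 or
more as high, so `select_action(11, 3)` yields a left turn.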



5. Variable Connections 

We are primarily interested in implementing memristors as a form of
variably-weighted connection between neurons in our SNNs, where variable
weight indicates that connection efficacy can alter during a trial. The
behaviour of each memristor under STDP depends on the memristive equations
used; these are defined in subsections 1 and 2. The linear resistor is
described in subsection 3. It is important to note that as the calculated
resistance values of memristive connections are based on their real-world
counterparts, simulation results should be replicable in hardware.
Constant-valued connections do not alter weight during a trial; rather,
their weights are altered between trials via the GA. The HP memristor was
chosen for study as it is well understood. The PEO-PANI memristor was
chosen as it is also well understood, but more importantly has a strongly
different memristance profile (see Fig. 1(a)), allowing the potential for
contrasting behaviour.



5.0.1. HP Memristor. The HP memristor is comprised of thin-film Titanium
Dioxide (TiO$_2$) and oxygen-depleted Titanium Oxide (TiO$_{2-x}$). The
boundary between the two compounds moves in response to the charge on
the memristor, which in turn alters the resistance of the device. More
details can be found in [44]. Memristance is defined in (7):



$M = R_{off} - R_{off} \times R_{on} \times \beta \times q$ (7)
$R_{off}$ is the resistance of the TiO$_2$ and $R_{on}$ is the resistance
of the TiO$_{2-x}$. $\beta$ is a parameter comprising both the thickness
of the device and the mobility of the oxygen vacancies in TiO$_2$ and
TiO$_{2-x}$, and q (0 < q < $q_{max}$) (see (8)) is the charge on the
device.

$q_{max} = (R_{on} - R_{off}) / (-R_{on} \times R_{off} \times \beta)$ (8)
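
As a consistency check, (8) can be recovered from (7) by requiring that the
memristance reaches $R_{on}$ when $q = q_{max}$. The short derivation below
assumes that (7) has the linear form
$M = R_{off} - R_{off} R_{on} \beta q$.

```latex
\begin{align*}
M(q_{max}) &= R_{on} \\
R_{off} - R_{off} R_{on} \beta \, q_{max} &= R_{on} \\
q_{max} &= \frac{R_{off} - R_{on}}{R_{off} R_{on} \beta}
         = \frac{R_{on} - R_{off}}{-R_{on} R_{off} \beta}
\end{align*}
```

Under that assumption, $q = 0$ gives $M = R_{off}$ and $q = q_{max}$ gives
$M = R_{on}$, so the charge sweeps the full memristance range.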

One further parameter is mem_lifetime, which is the amount of time
it takes the memristor to alter from $M = R_{off}$ to $M = R_{on}$. Once M
is calculated, the weight W of the memristive connection can be set as
1/M; weight is therefore equal to the inverse memristance of the device.



5.0.2. PEO-PANI Memristor. The PEO-PANI memristor consists of layers
of PANI, onto which Lithium ion (Li$^+$)-doped PEO is added [11]. We have
phenomenologically recreated the memristance profile of the PEO-PANI
memristor, resulting in behaviour similar to that seen in [11]. The
equation for M is identical to that of the HP memristor (7); the weighting
equation is given in (9):

$W = 1 - (1 / (R_{off} + R_{on} - M)) + (1 / R_{off})$ (9)



5.0.3. Linear Resistor. In this study the term "linear resistor" refers to
a theoretical nonvolatile-memory-augmented device that describes a linear
relation between i and v. The linear resistor alters W by 1/mem_lifetime,
therefore it takes mem_lifetime positive STDP events to increase W from
$R_{off}$ to $R_{on}$.
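
To make the three weight profiles concrete, the sketch below evaluates
them side by side. It assumes the linear memristance form
$M = R_{off} - R_{off} R_{on} \beta q$ for (7) and the weighting
$W = 1 - 1/(R_{off} + R_{on} - M) + 1/R_{off}$ for (9). The constants
`R_ON`, `R_OFF` and `BETA` are illustrative values that are not stated in
this section, and clamping the linear resistor to [0,1] starting from 0 is
likewise an assumption.

```python
# Illustrative constants: none of these device values are given in the text.
R_ON = 1.0           # assumed low-resistance bound
R_OFF = 100.0        # assumed high-resistance bound
BETA = 1.0           # assumed device parameter
MEM_LIFETIME = 1000  # number of STDP events needed to sweep the full range

# Equation (8): charge at which M reaches R_ON.
Q_MAX = (R_ON - R_OFF) / (-R_ON * R_OFF * BETA)


def charge_after(n_events):
    """Charge after n positive STDP events from q = 0, in steps of Q_MAX / MEM_LIFETIME."""
    return min(Q_MAX, max(0.0, n_events * Q_MAX / MEM_LIFETIME))


def memristance(q):
    """Equation (7), taken here to be linear in q."""
    return R_OFF - R_OFF * R_ON * BETA * q


def hp_weight(q):
    """HP-like weight: the inverse memristance."""
    return 1.0 / memristance(q)


def peo_pani_weight(q):
    """PEO-PANI-like weight, equation (9)."""
    return 1.0 - 1.0 / (R_OFF + R_ON - memristance(q)) + 1.0 / R_OFF


def linear_weight(n_events):
    """Linear resistor: W moves by 1 / MEM_LIFETIME per event (assumed clamped to [0, 1])."""
    return min(1.0, max(0.0, n_events / MEM_LIFETIME))
```

With these assumed constants, the PEO-PANI-like weight rises from about
0.01 to about 0.91 over the first 90 events, the HP-like weight rises from
about 0.10 to 1.0 over events 910 to 1000, and the linear resistor moves by
0.09 over any 90 events.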



5.1. STDP. As mentioned in section 2.3, a number of STDP implementations
exist. As our SNNs operated in discrete time, we follow [42] [32]
in using discrete-time stepwise waveforms. STDP could affect any variable
connection in the networks.

In our implementation, each neuron was augmented with a variable to
record the last time it spiked (ls), which was initially 0. When a neuron
spiked, ls was set to some positive number. At the end of each of the 21
steps that make up a single timestep, each memristive connection was
analysed by checking the ls values of its presynaptic and postsynaptic
neurons. If the sum of the two values exceeded a positive threshold
$\theta_s$, the memristance of the synapse was altered (see Fig. 2 and
(10), (11)). At the end of each step, each ls value was decreased by 1 to
a minimum of 0, creating a discrete stepwise waveform through time. Each
STDP event altered q by $\Delta q$, as detailed in (12), which was then
used to calculate the synaptic weight. The memory of the system is
therefore contained in q.
Figure 1. (a) Full range of memristance for the three variable
connection types, given 1000 positive STDP events followed by 1000
negative STDP events, with mem_lifetime=1000. (b) Sensitive ranges of
memristance curves. The PEO-PANI-like and linear resistor sensitive
ranges are described by STDP steps 0 to 90 in Fig. 1(a); the HP-like
sensitive range includes STDP steps 910 to 1000.



$\text{if } (ls_{pre} + ls_{post} > \theta_s \text{ AND } ls_{pre} > ls_{post}) \; \{q = \Delta q\}$ (10)

$\text{if } (ls_{pre} + ls_{post} > \theta_s \text{ AND } ls_{pre} < ls_{post}) \; \{q = \Delta q\}$ (11)

$\Delta q = q_{max} / mem\_lifetime$ (12)
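
The coincidence test in (10) and (11) and the charge step in (12) can be
sketched as below. Several details are assumptions made for illustration:
the value assigned to ls on a spike (`LS_ON_SPIKE`), the value of `Q_MAX`,
the clamping of q to [0, q_max], and the direction of the update. Here a
more recent postsynaptic spike (ls_post > ls_pre) is treated as the
positive event, following the example in the Fig. 2 caption, and a
positive event is taken to increase q. The threshold of 4 is the value
quoted in that caption.

```python
LS_ON_SPIKE = 3      # value ls takes when a neuron spikes (assumed)
THETA_S = 4          # STDP threshold, as quoted in the Fig. 2 caption
Q_MAX = 0.99         # assumed; in practice computed from (8)
MEM_LIFETIME = 1000
DELTA_Q = Q_MAX / MEM_LIFETIME  # equation (12)


def stdp_sign(ls_pre, ls_post, theta_s=THETA_S):
    """Return +1, -1 or 0 for a positive, negative or absent STDP event."""
    if ls_pre + ls_post <= theta_s or ls_pre == ls_post:
        return 0
    # Assumption: the postsynaptic neuron spiking more recently than the
    # presynaptic one is the potentiating case.
    return 1 if ls_post > ls_pre else -1


def update_charge(q, ls_pre, ls_post):
    """Apply one STDP check to the charge q, keeping q inside [0, Q_MAX]."""
    q += stdp_sign(ls_pre, ls_post) * DELTA_Q
    return min(Q_MAX, max(0.0, q))


def decay(ls):
    """End-of-step decrement of a last-spike counter, floored at 0."""
    return max(0, ls - 1)
```

The updated q would then be passed to whichever weighting equation the
connection type uses, so the only state carried between steps is q and the
two ls counters.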

From Fig. 1(a) it can be seen that the amount of change in connection
weight depends heavily on the current weight of the connection. In
particular, the HP memristor is insensitive to the effects of STDP where
W < 0.1, and the PEO-PANI-like memristor is insensitive where W > 0.9. The
linear resistor displays constant sensitivity. Fig. 1(b) shows the effect
of STDP on the weight in the more sensitive areas (for HP where W > 0.1,
for PEO-PANI where W < 0.9), and compares them to the effect of STDP on
the linear resistor over the same number of STDP steps. In the case of the
HP-like and PEO-PANI-like memristors, it can be seen that memristance can
account for a change equal to 90% of the total range of W within 90 STDP
steps; the same number of STDP steps gives a change in the linear resistor
equal to 10% of the total range of W.

Figure 2. Visualising the HP memristor when implemented as a synapse,
showing a positive STDP event. The numerical scale (right) shows the sum
of last spike events, where the presynaptic node has lastspike=2 and the
postsynaptic node has lastspike=3. Given spike_threshold = 4, memristance
moves the TiO$_2$ - TiO$_{2-x}$ boundary so there is more TiO$_2$ than
TiO$_{2-x}$; resistance decreases, leading to an increase in synaptic
efficacy.



6. Discovery Component

In our GA, two parents are selected fitness-proportionately, mutated, and
used to create two offspring. We use only mutation to explore connection
weight space; crossover is omitted as sufficient solution space exploration
can be obtained via a combination of self-adaptive weight and topology
mutations, a view that is reinforced in the literature, e.g. [45]. The
offspring are inserted into the population and the two networks with the
lowest fitness are deleted. Parents stay in the population, competing with
their offspring.

6.1. Self-adaptive Mutation. We utilise self-adaptive mutation rates as
in Evolution Strategies (ES) [46], to dynamically control the frequency and
magnitude of mutation events taking place in each network. This allows for
increased structural stability in highly fit networks whilst allowing less
fit networks to search solution space more widely per GA application. Here,
the $\mu$ (0 < $\mu$ < 1) value (originally $\sigma$, the rate of mutation
per allele) of each network is initialized randomly uniformly in the range
[0,0.25]. During a GA cycle, a parent's $\mu$ value is modified as in (13);
the offspring then adopts this new $\mu$ and mutates itself by this value,
before being inserted into the population. The proportionality constant is
set to 1 and therefore omitted.

$\mu \rightarrow \mu \exp(N(0,1))$ (13)
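
A minimal sketch of the self-adaptation step follows, reading (13) as a
log-normal update in which the rate is multiplied by the exponential of a
standard normal sample. The function name `self_adapt` and the clamping
bounds are assumptions; the text only states that the rate lies between 0
and 1.

```python
import math
import random


def self_adapt(rate, rng=random, lower=1e-6, upper=1.0):
    """Log-normal self-adaptation: rate -> rate * exp(N(0, 1))."""
    new_rate = rate * math.exp(rng.gauss(0.0, 1.0))
    # Clamping is an assumption used to keep the rate a valid probability.
    return min(upper, max(lower, new_rate))
```

Because the multiplier exp(N(0,1)) has a median of 1, the update is as
likely to lower the rate as to raise it, which lets selection decide which
direction is useful.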

Only non-memristive networks can alter their connection weights via the 
GA. Connection weights in this case are initially set during network creation, 
node addition, and connection addition randomly uniformly in the range 
[0,1]. Memristive network connections are always set to 0.5, and cannot be 
mutated from this value. This forces the memristive networks to harness 
the plasticity of their connections during a trial to successfully solve the 
problem. 
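
The steady-state cycle described at the start of this section can be
sketched as below. The representation of a network as a dictionary with a
"fitness" entry, the `mutate` and `evaluate` callbacks, and the choice to
build each offspring as a mutated copy of one selected parent are all
assumptions made for illustration.

```python
import copy
import random


def roulette(population, rng):
    """Fitness-proportionate selection of one individual."""
    total = sum(ind["fitness"] for ind in population)
    if total <= 0:
        return rng.choice(population)
    pick = rng.uniform(0, total)
    running = 0.0
    for ind in population:
        running += ind["fitness"]
        if running >= pick:
            return ind
    return population[-1]


def ga_cycle(population, mutate, evaluate, rng=random):
    """One cycle: create two offspring, insert them, delete the two least fit."""
    offspring = []
    for _ in range(2):
        child = copy.deepcopy(roulette(population, rng))
        mutate(child, rng)                  # self-adaptive and topology mutation
        child["fitness"] = evaluate(child)  # one trial on the test problem
        offspring.append(child)
    population.extend(offspring)
    population.sort(key=lambda ind: ind["fitness"])
    del population[:2]  # parents survive unless they are among the worst
    return population
```

Population size stays constant across cycles, and a parent is only removed
if it ranks among the two least fit networks after its offspring are
inserted.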

7. Topology Mechanisms 

In addition to self-adaptive mutation, we apply two topology alteration
schemes to allow the modification of the spiking networks by
adding/removing (i) hidden layer nodes and (ii) neural connections. This
framework allows each network to control its own knowledge representation
autonomously by adapting network topology to reflect the complexity of the
problem considered. All network types use these topology mechanisms. Our
self-adaptive topology mechanisms bear some resemblance to Takagi-Sugeno
(TS) (neuro-)fuzzy models [47]-[49] in that both parameter and
self-organized structure learning occur (usually using a recursive
least-squares algorithm for parameters and some rule density or utility
metric for structure). However, TS systems are commonly used for
clustering and use multiple fuzzy rules to define a solution, rather than
a single individual as in our case.

7.1. Neuroevolution. Given the nature of NC, it would be useful if
appropriate network structure were allowed to develop until some
task-dependent required level of computing power is attained. A number of
encoding variants have been developed specifically for neuroevolution,
including Analog Genetic Encoding (AGE) [50], which allows both neurons
and connections to be modified, amongst others, e.g. [51]. A popular
framework is NeuroEvolution of Augmenting Topologies (NEAT) [52], which
combines neurons from a predetermined number of niches to encourage
diverse neural utility and enforce niche-based evolutionary pressure. This
method has been shown to be amenable to real-time evolution [53].
Successful applications of neuroevolution range from real-world
optimisation [54] and classification [55] to control [56][57].

In our system, each network has a varying number of hidden layer neurons
(initially 9, and always > 0). Neurons can be added to or removed from the
single hidden layer based on two new self-adaptive parameters, $\psi$
(0 < $\psi$ < 1) and $\omega$ (0 < $\omega$ < 1). Here, $\psi$ is the
probability of performing neuron addition/removal and $\omega$ is the
probability of adding a neuron; removal occurs with probability
1 - $\omega$. Both have initial values taken from a random uniform
distribution, with ranges [0,0.5] for $\psi$ and [0,1] for $\omega$.
Offspring networks have their parents' $\psi$ and $\omega$ values modified
using (13) as with $\mu$, with neuron addition/removal taking place after
mutation. Added nodes are initially excitatory with 50% probability,
otherwise they are inhibitory.
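
The neuron addition/removal step can be sketched as follows. The hidden
layer is represented here as a plain list of type labels, and the neuron
to remove is chosen uniformly at random; both are assumptions for
illustration, as is the function name.

```python
import random


def mutate_hidden_layer(hidden, psi, omega, rng=random):
    """Add or remove one hidden neuron.

    hidden: list of "excitatory" / "inhibitory" labels, one per neuron.
    psi:    probability that an addition/removal event happens at all.
    omega:  probability that the event is an addition rather than a removal.
    """
    if rng.random() >= psi:
        return hidden  # no topology event this cycle
    if rng.random() < omega:
        # New neurons are excitatory with 50% probability.
        hidden.append("excitatory" if rng.random() < 0.5 else "inhibitory")
    elif len(hidden) > 1:
        # The hidden layer must always keep at least one neuron.
        hidden.pop(rng.randrange(len(hidden)))
    return hidden
```

A full implementation would also create or delete the connections attached
to the affected neuron, which this sketch leaves out.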

7.2. Connection Selection. Feature selection is a way of streamlining
input to a given process. Automatic feature selection includes wrapper
approaches (where feature subsets change during the running of the
algorithm [58]) and filter approaches (where the subset selection is a
pre-processing step [59]). The connectivity pattern of artificial neural
networks was first evolved by Dolan and Dyer [60]. A comparative study can
be found in [20].



In this paper we allow any connection to be individually enabled/disabled.
During a GA cycle a connection can be enabled or disabled based on a new
self-adaptive parameter $\tau$ (which is initialized and self-adapted in
the same manner as $\mu$ and $\psi$). If a connection is enabled for a
non-memristive network, its connection weight is randomly initialised
uniformly in the range [0,1]; memristive connections are always set to
0.5. All connections are initially enabled for new networks. During a node
addition event, new connections are set probabilistically, with
P(connection_enabled) = 0.5. Connection Selection is particularly
important to the memristive networks. As they cannot alter connection
weights via the GA, variance induced in network connectivity patterns
plays a large role in the generation of useful STDP patterns. In the
context of NC, an evolutionary algorithm could conceivably tinker with
connection structure as a means of homeostatic fault tolerance and
recovery, as well as a compression technique to reduce the number of
active synapses.
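
Connection selection can be sketched as a per-connection toggle. Treating
$\tau$ as an independent flip probability for each connection is an
assumption, as are the dictionary representation and the function name.

```python
import random


def mutate_connections(connections, tau, memristive, rng=random):
    """Toggle each connection's enabled flag with probability tau.

    connections: list of dicts with "enabled" (bool) and "weight" (float).
    memristive:  True if the network uses memristive connections.
    """
    for conn in connections:
        if rng.random() < tau:
            conn["enabled"] = not conn["enabled"]
            if conn["enabled"]:
                # Re-enabled memristive connections start at 0.5; others
                # draw a fresh weight uniformly from [0, 1].
                conn["weight"] = 0.5 if memristive else rng.uniform(0.0, 1.0)
    return connections
```

For memristive networks this toggle is the only evolutionary handle on
individual connections, since their weights are not mutated by the GA.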



Our chosen robotics simulator was Webots [61] , a platform that is popu- 
lar amongst the research community. Alternatives are summarised in [62] . 
Webots was selected due to the accuracy of its simulations and prevalence 
of successful applications in the literature. Examples include evolution of 
simulated Khepera controllers to avoid obstacles [63] , showing the suitability 
of Webots to an evolutionary approach. Tellez and Angulo [64] apply
incremental neuroevolution to successfully generate complex behaviours from
initially less complex environments or sensory configurations. Hierarchical
neural control is exploited to guide a simulated Khepera around a T-maze 
using self-organising neural networks similar to our own [65] . 

8.1. The Agent. The agent was a simulated Khepera II robot with 8 light 
sensors and 8 IR distance sensors (see Fig. 3(a)). At each timestep (32ms in 
real time), the agent sampled its light sensors, whose values ranged from 8 




in (20] 
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(fully illuminated) to 500 (no light) and IR distance sensors, whose response 
values ranged from (no object detected) to 1023 (object very close). All 
sensor readings were scaled to the range [0,1] for computational reasons 
(0 being unactivated, 1 being highly activated). Six sensors were used to 
comprise the input state for the SNN, three IR and three light sensors at 
positions 0, 2 and 5 as shown in Fig. 3(a). Additionally, two bump sensors 
were added to the front-left and front-right of the agent to prevent it from 
becoming stuck against an object. If either bump sensor was activated, an 
interrupt was sent causing the agent to reverse 10cm and to be
penalised by 10 timesteps. Movement values and sensory update delays were
constrained by Webots Khepera data. Three actions were possible: forward
(maximum movement on both left and right wheels) and continuous turns
to both the left and right (caused by halving the left/right motor outputs
respectively). Actions were calculated once at the end of each timestep from
the output neuron classifications: (high, high) or (low, low) = forwards,
(high, low) = left turn, (low, high) = right turn.
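
The mapping from output neuron classifications to actions can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each output neuron has already been classified as high or low, and the function names, the boolean encoding and the normalised speed constant are hypothetical. Halving the left motor for a left turn is one reading of the description above.

```python
from typing import Tuple

MAX_SPEED = 1.0  # assumed normalised wheel speed (hypothetical; real limits come from Webots)


def decode_action(out1_high: bool, out2_high: bool) -> str:
    """Map the two output neuron classifications to one of three actions."""
    if out1_high == out2_high:
        # (high, high) or (low, low) both mean "forward"
        return "forward"
    # (high, low) = left turn, (low, high) = right turn
    return "left" if out1_high else "right"


def wheel_speeds(action: str) -> Tuple[float, float]:
    """Return (left, right) wheel speeds; a turn halves the motor on the turning side."""
    if action == "left":
        return (MAX_SPEED / 2.0, MAX_SPEED)
    if action == "right":
        return (MAX_SPEED, MAX_SPEED / 2.0)
    return (MAX_SPEED, MAX_SPEED)
```

Keeping the classification step separate from the motor command means the same decoder could be reused if the wheel speed limits change.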



8.2. The Environment. The agent was located within a walled arena 
which it could not leave, with coordinates ranging from [-1,1] in both x 
and y directions and walls around the boundary having height z = 0.15. 
Adding to the complexity of the environment, a three-dimensional box was
placed centrally in the arena, with vertices on "ground level" at (x = -0.4,
y = -0.4), (-0.4, 0.4), (0.4, 0.4), and (0.4, -0.4), and raised to a height
of z = 0.15. A light source, modelled on a 15 Watt bulb, was placed at
the top-right hand corner of the arena (x = 1, y = 1). The agent initially
faced North, and its start position was constrained to the range
x + y < -1.5. The agent had to traverse the environment and approach the
light source to receive reward. The environment is shown in Fig. 3(b).

When the agent reached the goal state (where x + y > 1.6), the responsible
network received a constant fitness bonus of 2500, which was added
to the fitness function f outlined in (14). The denominator in the equation
expresses the difference between the position of the goal state (1.6) and the
current agent position (posx and posy), and st is the number of timesteps
taken to solve. The minimum value of this function is capped so that f > 0.
The fitness of an agent is calculated at the end of every timestep, with the
highest attained value of f during the trial kept as the fitness value for that
network. Optimal performance gives f = 11800, which corresponds to 700
timesteps from start to goal state with no collisions.
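
The stated optimum can be checked against the fitness function by simple arithmetic. The decomposition below takes the reported values at face value. That the position term contributes 10000 (i.e. that the denominator, written d here, equals 0.1 when the optimum is recorded) is an inference from those values and is not stated in the text.

```latex
\begin{align*}
f_{\mathrm{opt}} &= \underbrace{2500}_{\text{goal bonus}}
  + \underbrace{\frac{1000}{d}}_{\text{position term}}
  - \underbrace{700}_{st} = 11800 \\
\Rightarrow \quad \frac{1000}{d} &= 11800 - 2500 + 700 = 10000 \\
\Rightarrow \quad d &= 0.1
\end{align*}
```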



$$ f = \frac{1}{1.6 - |pos_x + pos_y|} \times 1000 - st \qquad (14) $$

Figure 3. (a) Khepera sensory arrangement. 3 light sensors
and 3 IR sensors share positions 0, 2 and 5. Two bump sensors,
B1 and B2, are shown attached at 45 degree angles to
the front-left and front-right of the robot. (b) The test
environment. The agent begins in the lower-left and must reach
a light source (circle) in the upper-right, circumnavigating
the central obstacle. An example agent path is shown (dotted
line). In the dynamic reward scenario (Section 13), the
reward is moved to the upper-left.



9. Experimental Setup 

In the following experiments we gauged the impact of both types of
memristive synapse, comparing to benchmark systems containing (i)
memory-augmented linear resistors and (ii) constant-valued connections.
To aid clarity, we adopt a shorthand of "PEO" for networks containing only
PEO-PANI connections. Likewise, "HP", "LIN" and "GA" networks refer to
networks containing only HP memristors, linear resistors, and constant
connections respectively.

An experiment began with the generation of 100 networks of a given 
type (HP/PEO/LIN/GA). Every network in the population was then trialed 
on the test problem, with a maximum of 4000 timesteps per trial (long 
enough to allow for initial exploration). After this, 1000 generations of 
GA application took place and newly-generated networks were trialed on 
the test problem. Every 20 generations, the current state of the system was 
observed and used to create the results that follow. The entire process can be 
described as a system, with one system per connection type. All experiments 
were repeated 30 times per system. In any hardware implementation, the 
final solution would be the single fittest network from the population. 
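
The schedule of a single experimental repeat can be sketched as below. This is only an outline under stated assumptions: `make_network`, `trial` and `ga_step` are hypothetical callables standing in for network creation, a single trial on the test problem, and one generation of the GA, none of which are specified in this section. Recording the best fitness at each observation point is likewise just one example of "observing the system".

```python
from typing import Any, Callable, List, Tuple

POP_SIZE = 100         # networks per population (from the text)
GENERATIONS = 1000     # generations of GA application (from the text)
SNAPSHOT_EVERY = 20    # observation interval in generations (from the text)
MAX_TIMESTEPS = 4000   # maximum timesteps per trial (from the text)


def run_experiment(
    make_network: Callable[[], Any],
    trial: Callable[[Any, int], float],
    ga_step: Callable[[List[Any], List[float]], Tuple[List[Any], List[float]]],
) -> List[Tuple[int, float]]:
    """Run one repeat and return (generation, best fitness) at each observation."""
    # Generate and trial the initial population.
    population = [make_network() for _ in range(POP_SIZE)]
    fitness = [trial(net, MAX_TIMESTEPS) for net in population]

    snapshots: List[Tuple[int, float]] = []
    for gen in range(1, GENERATIONS + 1):
        # ga_step is assumed to breed, trial and insert new networks itself.
        population, fitness = ga_step(population, fitness)
        if gen % SNAPSHOT_EVERY == 0:
            snapshots.append((gen, max(fitness)))
    return snapshots
```

With these constants each repeat yields 50 observation points, and the full study would call `run_experiment` 30 times per connection type.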



In traditional neural network terminology, the objective of the networks is to find a
suitable sensor-motor mapping to allow for navigation, the function is the function that
approximates this mapping, and the compactness is the minimal network topology as
shown in the results sections.

Table 1. Detailing t-test results (p values) for HP, PEO,
LIN and GA systems on the test problem.

              Performance   High fitness   Neurons   Connectivity
HP vs PEO     0.009         0.033          0.318     0.983
HP vs LIN     0.009         0.106          0.601     0.349
HP vs GA      0.023         0.091          0.699     0.859
PEO vs LIN    0.763         0.009          0.130     0.171
PEO vs GA     0.027         0.044          0.107     0.781
LIN vs GA     0.019         0.684          0.762     0.289



As the robot's start location was tightly constrained, we were able to 
compare system performance, defined as the first generation in which any 
network in that system found the goal state. This measure produced 30 
numbers, one per experimental repeat, that allowed us to perform t-tests 
to compare the respective goal-seeking performance of the four systems. In
Table 1, "Performance" was the average performance per network type as
outlined above. "High fitness" refers to the mean fitness of the
highest-fitness network in each run. "Neurons" was the average number of
connected neurons per network in the population at the end of a run, and
"Connectivity" was the average percentage of enabled connections in the
population. Statistical significance was assessed at the 5% level.
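
As an illustration of how such a comparison is formed, the sketch below computes a t statistic and degrees of freedom from summary statistics. It assumes Welch's unequal-variance form, since the text does not say which t-test variant was used, and it assumes that the bracketed values in Table 2 are sample standard deviations over 30 repeats. The example inputs are the "Perf" entries for HP and PEO from Table 2. Converting the statistic to a p-value requires the Student t distribution and is omitted here.

```python
import math
from typing import Tuple


def welch_t(mean1: float, sd1: float, n1: int,
            mean2: float, sd2: float, n2: int) -> Tuple[float, float]:
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# "Perf" for HP and PEO from Table 2, 30 repeats each (assumed sample SDs).
t_stat, dof = welch_t(526.1, 992.4, 30, 17.0, 34.6, 30)
```

Because the HP standard deviation dominates, the effective degrees of freedom stay close to those of a single 30-sample group rather than the pooled 58.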

SNN parameters were initial hidden layer nodes = 9, a = 0.3, b = 0.05,
c = 0.0, c_ini = 0.5, ythresh = 1.0 and output window size = 21. Memristive
parameters were R_on = 0.01, R_off = 1, 5 = 100, mem lifetime = 1000,
lastspike = 3, spike-threshold = 4. During a trial the variable connections
in the networks may alter their weights via STDP. After every trial, variable
connections were reset to their original weight of 0.5.

9.1. Performance. The most striking result from Table 1 was that the
PEO networks significantly outperformed the GA networks, both in
terms of performance (p=0.027) and high fitness (p=0.044). LIN networks
also outperformed GA networks (p=0.019), although they did not have
statistically better final fitness. These results indicate that these networks learn to
harness the plasticity of their connections alongside the topology variations 
introduced by connection selection to swiftly evolve goal-finding networks. 

In contrast, HP networks display significantly lower performance than all 
other network types (p=0.009 vs. PEO, p=0.009 vs. LIN, p=0.023 vs GA), 
as well as significantly worse high fitness than PEO networks (p=0.033). 
This observed behaviour may be due to the memristance profile (see Fig. 1(b))
being highly sensitive to the effects of STDP for high (> 0.9) values of W,
as well as being more likely to be stuck at low (< 0.1) W values, a notion
echoed in [66]. It is reasoned that this combination of effects makes the
memristor less suited to attaining highly-activated networks (network
analysis reveals lower numbers of spikes per network, possibly preventing the
network from reliably achieving certain output patterns).

Figure 4. (a) Average fitness of highest-fitness network per
run (b) average fitness of entire population per run (c) average
connected hidden layer nodes (d) average enabled connections
for the first experiment.

Figs. 4(a) and 4(b) show that the PEO and LIN networks share similar
fitness profiles, both being distinctly quicker to attain high fitness values 
than HP and GA networks. Both PEO and LIN networks solve the envi- 
ronment within 60 trials and attain their maximal fitness values within 150 
trials. GA networks reach lower final fitness values in a more gradual man- 
ner, reaching the maximal fitness value after 500 trials. HP networks are 
slower still; highest fitness values are attained after 950 trials. All systems 
eventually solved the problem, except 2 runs of HP networks. A summary
of averages and standard deviations is given in Table 2.

9.2. Topology. Although there were variations between the network types
with regards to the numbers of hidden layer neurons, no statistically
significant differences were observed (Table 1 shows p-values ranging from

Figure 5. Average self-adaptive parameters (a) μ (b) ψ
(c) ω (d) τ in the population for the first experiment.



Table 2. Detailing averages and standard deviations for HP,
PEO, LIN and GA systems on the test problem.

           HP              PEO-PANI       LIN            GA
Perf       526.1 (992.4)   17 (34.6)      14.7 (32.5)    77.6 (130.0)
High fit   10660 (2280)    11581 (303)    11363 (398)    11402 (277)
Avg fit    9477 (3333)     11454 (319)    11058 (728)    11420 (423)
Conns(%)   49.42 (9.61)    49.02 (4.63)   51.26 (4.06)   51.19 (4.71)
Nodes      16.68 (1.74)    17.04 (0.09)   16.89 (0.54)   17.11 (0.57)
μ          0.121 (0.09)    0.123 (0.1)    0.115 (0.1)    0.018 (0.01)
ψ          0.073 (0.04)    0.062 (0.02)   0.019 (0.02)   0.056 (0.03)
ω          0.122 (0.11)    0.113 (0.07)   0.135 (0.11)   0.122 (0.11)
τ          0.022 (0.034)   0.010 (0.01)   0.010 (0.01)   0.011 (0.01)



p=0.107 to p=0.762). While PEO and LIN networks show smooth profiles
to their final average neuron numbers of 17.049 and 16.894 respectively

Table 3. Detailing self-adaptation t-test results (p values)
for all systems in the first experiment.

              μ        ψ       ω       τ
HP vs PEO     0.916    0.211   0.616   0.069
HP vs LIN     0.549    0.017   0.226   0.07
HP vs GA      <0.001   0.064   0.988   0.09
PEO vs LIN    0.618    0.178   0.243   0.525
PEO vs GA     <0.001   0.434   0.471   0.129
LIN vs GA     <0.001   0.471   0.465   0.458



(Fig. 4(c)), HP and GA networks show more unstable profiles to final
numbers of 16.677 and 17.105 neurons respectively. No significant differences
were found with regards to connectivity, although Fig. 4(d) shows a general
order of LIN/GA networks being more densely connected, followed by HP
networks and finally PEO networks. Again, profiles are more stable for PEO
and LIN networks than they are for GA and HP networks.

9.3. Self-adaptive Parameters. In all cases, a lower parameter value is 
associated with a more stable evolutionary process, as such events are evo- 
lutionarily preferred to be less frequent within those networks. 

9.3.1. Mutation. Being the only network type to utilise the μ parameter, the
GA networks' final μ values were, as expected, significantly different when
compared to the other network types (all p-values <0.001, Table 3). Between
the variable networks there were no statistically significant differences. The
GA mutation profile (Fig. 5(a)) can be seen to rapidly decrease from a value
of 0.3 to 0.05 at generation 300, briefly climb to approximately 0.07 at
generation 480, then descend smoothly to a final value of 0.019. Other network
μ profiles are irrelevant to the performance of those systems.

9.3.2. Neuron addition/removal. The probability of performing a neuron
addition/removal event is encapsulated in the ψ parameter. One statistically
significant difference can be seen in Table 3, showing that such events are
more likely to occur in HP networks than LIN networks. This can be seen in
Fig. 4(c) to drive the HP networks to lower numbers of neurons per network.
The probability of performing an addition rather than removal is governed
by the ω parameter; Table 3 shows no statistically significant differences
between the network types. These results indicate that no single network type
allows the evolutionary process to self-adapt to produce networks containing
statistically fewer neurons whilst maintaining high performance.

9.3.3. Connection addition/removal. Connection selection is associated with
the τ parameter. Table 3 shows no statistically significant differences,
although values comparing HP to other network types are all close to
significance (p=0.069 vs. PEO, p=0.07 vs. LIN, p=0.09 vs. GA). This difference
is reflected in Fig. 5(d), where HP networks produce the only initially
upwards-trending profile, which follows a markedly different curvature to the
others. Again, PEO and LIN network profiles can be observed to be similar to
each other.
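
The two-stage neuron addition/removal decision of Section 9.3.2 can be sketched as below. This is a minimal illustration under assumptions: one event is attempted per call, the hidden layer is represented only by its neuron count, and the `min_hidden` floor is hypothetical. The function name and signature are not taken from the authors' code.

```python
import random


def constructivism_event(n_hidden: int, psi: float, omega: float,
                         rng: random.Random, min_hidden: int = 1) -> int:
    """Return the hidden neuron count after one possible add/remove event.

    psi   -- probability that an addition/removal event happens at all
    omega -- given an event, probability that it is an addition
    """
    if rng.random() >= psi:
        return n_hidden                 # no event this cycle
    if rng.random() < omega:
        return n_hidden + 1             # add a hidden neuron
    return max(min_hidden, n_hidden - 1)  # remove one, respecting the assumed floor
```

A lower ψ therefore means fewer topology changes per generation, which matches the reading of lower parameter values as a more stable evolutionary process.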



10. Heterogeneous Mixtures of Memristors 

While the experiments presented in Section 9 show the benefits of
memristive connections, each network's behaviour under STDP is limited as
homogeneous memristors follow identical STDP response curves as shown in
Fig. 1(a). To increase the variety of plastic behaviours available to the
network as a whole, we now extend the system to allow networks to be 
comprised of all three variable connection types (HP memristor, PEO-PANI 
memristor, and linear resistor). As with our previous experiments, these 
networks should be replicable in hardware, provided that the myriad mem- 
ristors can interface with a single neuron type, operate on the same scale, 
and possess similar electrical tolerances. 

Mixing different types of synaptic plasticity has been investigated
previously [37, 39, 40]. In the first paper, interneural connections are
affected by six distinct variations of the traditional Hebb rule. In [39],
spike transmission from synapse to neuron is probabilistic, with heterogeneous
probabilities throughout the network. Finally, [40] uses four unique Hebbian
learning rules for its connections; networks may be comprised of all four
connection types. All three papers consistently report that networks benefit
from the inclusion of varied plasticity rules, mainly in terms of speed of goal
finding, or encoding of functionality that is unattainable in homogeneously
plastic networks. Comparisons are made to GA, PEO and LIN networks discussed
in Section 9. Experiments are conducted with the same parameters shown
in Section 9, on the same environment as Section 8.

10.1. Implementation. To facilitate the evolution of heterogeneous net- 
works the system is altered in two regards, (i) connection creation and (ii) 
GA activity. 



10.1.1. Connection Creation. On initialization of a new variable connection
(via network creation, node addition, or connection addition), the type of
that connection is selected probabilistically with P = 0.33 of each type
(HP-like memristor, PEO-PANI-like memristor, variable resistor) being selected.

Table 4. Detailing averages and standard deviations for HET,
PEO, LIN and GA systems on the test problem.

           HET            PEO-PANI       LIN            GA
Perf       1.7 (4.8)      17 (34.6)      14.7 (32.5)    77.6 (130.0)
High fit   11696 (186)    11581 (303)    11363 (398)    11402 (277)
Avg fit    11474 (285)    11454 (319)    11058 (728)    11420 (423)
Conns(%)   48.97 (5.58)   49.02 (4.63)   51.26 (4.06)   51.19 (4.71)
Nodes      16.98 (0.65)   17.04 (0.09)   16.89 (0.54)   17.11 (0.57)
μ          0.074 (0.03)   0.123 (0.1)    0.115 (0.1)    0.018 (0.01)
ψ          0.072 (0.02)   0.062 (0.03)   0.019 (0.02)   0.056 (0.03)
ω          0.132 (0.09)   0.113 (0.07)   0.135 (0.11)   0.122 (0.11)
τ          0.011 (0.01)   0.010 (0.01)   0.010 (0.01)   0.011 (0.01)



10.1.2. GA Activity. Discovery is modified to allow one memristor type to
mutate into another during a GA cycle. As connection weights are always 0.5
before a trial begins and cannot be mutated, μ has no role in the memristive
networks. Instead we use μ to control the rate of memristor type mutation
taking place. During a GA cycle, after mutation, each connection in the
child networks may alter to one of the two other connection types upon
satisfaction of probability μ. Each network's value of μ is self-adapted as in
equation (9), and is initially seeded uniformly at random in the range [0,0.25],
as with ψ and τ.
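
The two changes described in Section 10.1 can be sketched as follows. This is a hedged illustration only: connections are reduced to type labels, the names `CONNECTION_TYPES`, `new_connection_type` and `mutate_types` are hypothetical, and the choice between the two alternative types on mutation is assumed to be uniform, which the text does not specify.

```python
import random
from typing import List

CONNECTION_TYPES = ("HP", "PEO", "LIN")  # the three variable connection types


def new_connection_type(rng: random.Random) -> str:
    """Pick a type for a newly created connection, each with equal probability."""
    return rng.choice(CONNECTION_TYPES)


def mutate_types(types: List[str], mu: float, rng: random.Random) -> List[str]:
    """With probability mu per connection, switch it to one of the two other types."""
    mutated = []
    for current in types:
        if rng.random() < mu:
            others = [t for t in CONNECTION_TYPES if t != current]
            mutated.append(rng.choice(others))  # assumed uniform over the alternatives
        else:
            mutated.append(current)
    return mutated
```

Because a mutated connection always changes type, μ directly sets the expected fraction of connections whose plasticity profile differs between parent and child.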

10.2. Performance. Performance is shown in Table 5, which reveals that
heterogeneous networks have higher performance characteristics than PEO
(p=0.026), LIN (p=0.043) and GA (p=0.003) networks. Fig. 6(a) reveals
that goal-finding behaviour is attained within 20 trials, faster than any
homogeneous network type. The final "high fitness" value attained is higher
than all other network types (Fig. 6(a)), and significantly higher than that of
both LIN (p<0.001) and GA (p<0.001) networks (Table 5). Average fitness
is shown in Fig. 6(b) and can be seen to attain near-optimal population-wide
fitness after only 300 trials, an improvement over the other network
types considered. These results suggest that mixing synaptic behaviour
allows the networks to more quickly attain higher performance characteristics.
Averages and standard deviations are given in Table 4.

10.3. Topology. As with the homogeneous network comparisons, Table 5
reveals no significant differences with regard to final heterogeneous
network neuron numbers. Fig. 6(c) visualises a steady profile that terminates
slightly below its starting value of 17. Percentage connectivity drops by
approximately 1% during the experiment to a final value of 49%, shown in
Fig. 6(d), giving a similar final value to PEO networks, lower than LIN
and GA. This is (just) significantly lower than LIN networks (p=0.049,
Table 5), although the actual difference is only 2%. The general lack
of statistically significant differences suggests that the increased performance
characteristics of heterogeneous networks are not offset by increased network
complexity, and in some cases the networks are less complex.

10.4. Self-adaptive parameters. 

10.4.1. Mutation. Heterogeneous and GA networks use μ to control different
aspects of the GA cycle (HET networks use it to control the rate of
switching of memristive behaviours, GA networks use it to alter connection
weights). Because of this, a statistically significant p-value <0.001 is seen
between these network types (Table 6). LIN and PEO networks do not use
μ so comparisons are omitted. Fig. 7(a) shows that the HET profile is more
stable than the GA profile.



10.4.2. Neuron addition/removal. The profile of ψ (the rate of constructivism
events in the networks) is shown in Fig. 7(b) to descend to a final
value of 0.072, higher than the other network types. This is significantly
higher than that of LIN (p=0.005) and GA (p=0.015) networks (Table 6),
although this seems to correspond to heightened topology manipulation
activity rather than different final neuron levels, as shown in Table 5. All
network types show similar downward-trending profiles for the ω parameter,
which is the probability of node addition as opposed to node removal
upon satisfaction of ψ, shown in Fig. 7(c). The similarity of these profiles
is reflected in their respective p-values (Table 6), which show no significant
differences. These results indicate that the evolutionary process does not
distinguish significantly between the variable connection types used in the
networks.



10.4.3. Connection addition/removal. Heterogeneous networks follow similar
profiles to LIN and PEO networks, as visualised in Fig. 7(d). Because
of this, there are no statistical differences in terms of τ. Despite this lack
of statistical significance in the controlling parameter, HET networks are
significantly less connected than LIN networks (p=0.049, Table 5). This
indicates that connection removal events are more likely to produce beneficial
outcomes in HET networks than they are in LIN networks, as the frequency
of those events is similar across the network types.



11. Analysis of Heterogeneous Network Evolution 

Although heterogeneous networks were found to have higher performance 
characteristics than all other network types, it would be beneficial to know 
how STDP is used to benefit heterogeneous networks. For example, are 
particular variable connection types more likely to be attached to excitatory 
or inhibitory neurons? Is there an evolutionary preference to have a given

Table 5. Detailing performance characteristic p-values
when comparing heterogeneous memristive networks to
PEO, LIN and GA networks.

HETERO vs.   Performance   High fitness   Neurons   Connectivity
PEO          0.026         0.098          0.718     0.672
LIN          0.043         <0.001         0.484     0.049
GA           0.003         <0.001         0.288     0.537



Table 6. Detailing self-adaptive parameter characteristic
p-values when comparing heterogeneous memristive networks
to PEO, LIN and GA networks.

HETERO vs.   μ        ψ       ω       τ
PEO          0.033    0.136   0.184   0.093
LIN          0.082    0.005   0.839   0.302
GA           <0.001   0.015   0.495   0.649



Table 7. Detailing the distribution of memristor numbers
by type as an average of the highest-performing network in
each run.

Comparison         P-value
HP vs. PEO-PANI    0.118
HP vs. LINEAR      0.609
PEO vs. LINEAR     0.054



type of variable connection attached to certain inputs, or driving a particular
output neuron? We focus on two broad themes: evolution (this section), i.e.
evolutionary preferences for certain memristive configurations, and runtime
(Section 12), i.e. how STDP is used by those configurations to generate
high-performance behaviour.

We average only the best network in each run, allowing us to focus on
topological configurations that are beneficial. Fig. 8 shows HP and
LIN components being preferred to PEO-PANI memristors. Despite
Table 7 showing no statistically significant differences, it is interesting to see
the worst-performing memristor type from the homogeneous networks (HP)
being preferred to the best-performing (PEO-PANI). This suggests that the
evolutionary process finds a way to harness HP-like behaviour more readily
when used in combination with other memristor types.

11.1. Memristor Types per Layer. We now consider the specific positions
of memristors in the networks. As the networks consist of three
layers, memristors can be classified based on the layers of the neurons that
they connect, e.g. input, hidden, or output. Fig. 9 confirms results seen in
Fig. 8, specifically that PEO-PANI memristors are universally more sparsely
utilised than the other connection types. In all cases, PEO-PANI memristors
become the minority within 200 generations.

Figure 6. (a) Average fitness of highest-fitness network per
run (b) average fitness of entire population per run (c) average
connected hidden layer nodes (d) average enabled connections
in heterogeneous networks.

Two main significant results are shown in Table 8. Firstly, LIN
connections are preferred to both HP (p=0.045) and PEO-PANI (p=0.012)
types when connecting two hidden layer neurons (Fig. 9(b)). A feasible
explanation is that the networks benefit from a basis of stable (e.g. linear)
communications within the hidden (processing) layer to generate reliable
action sequences. More importantly, this result indicates that more linear
memristors, if physically realised, could play an important role in future NC
implementations. Secondly, HP memristors are significantly (p=0.04)
preferred to PEO-PANI memristors when connecting hidden neurons to output
neurons; Fig. 9(c) shows HP memristors are by far the most popular choice
in this role. HP memristors appear to be more suited to reliably reduce the
number of spikes in the output trains to generate low output classifications
when a turn is required.

Figure 7. Average self-adaptive parameters (a) μ (b) ψ
(c) ω (d) τ in the population for heterogeneous networks.

Table 8. Detailing the distribution of memristor numbers
by layer as an average of the highest-performing network in
each run.

Memristor position   Comparison         P-value
Input - hidden       HP vs. PEO-PANI    0.710
                     HP vs. LINEAR      0.543
                     PEO vs. LINEAR     0.339
Hidden - hidden      HP vs. PEO-PANI    0.482
                     HP vs. LINEAR      0.045
                     PEO vs. LINEAR     0.012
Hidden - output      HP vs. PEO-PANI    0.04
                     HP vs. LINEAR      0.079
                     PEO vs. LINEAR     0.839

Figure 8. Average number of each memristor type across
the best network per run throughout the evolutionary process.



Table 9. Detailing the distribution of memristor numbers
relative to presynaptic and postsynaptic neuron types, as an
average of the highest-performing network in each run.

Neuron location   Neuron type   Comparison         P-value
Presynaptic       Excitatory    HP vs. PEO-PANI    0.018
                                HP vs. LINEAR      0.805
                                PEO vs. LINEAR     0.015
                  Inhibitory    HP vs. PEO-PANI    0.516
                                HP vs. LINEAR      0.259
                                PEO vs. LINEAR     0.874
Postsynaptic      Excitatory    HP vs. PEO-PANI    0.721
                                HP vs. LINEAR      0.368
                                PEO vs. LINEAR     0.314
                  Inhibitory    HP vs. PEO-PANI    0.061
                                HP vs. LINEAR      0.72
                                PEO vs. LINEAR     0.183



11.2. Position Analysis. Table 9 shows the relative numbers of memristor
types that are connected (pre- or post-synaptic) to excitatory and inhibitory
neurons. PEO-PANI memristors are less preferred than the other connection
types (p=0.018 vs. HP memristors and p=0.015 vs. LIN connections) when
an excitatory neuron is presynaptic. As excitatory neurons are the sole
method of activity generation, LIN may be preferred as they respond to
STDP less dramatically, and can therefore more reliably maintain useful
activity patterns. Since excitatory spikes are also responsible for generating
output spike trains, HP are reasoned to be preferred as they can reliably
reduce network activity to generate low spike train classifications when
required. This supports the claim that certain memristor types are preferred
in certain situations, and implies that the nonlinear activity of the HP and
PEO-PANI memristors may be harnessed to alter, rather than preserve,
network behaviour.

Figure 9. Average number of each memristor type (a) between
input and hidden layer (b) within hidden layer (c) between
hidden and output layer in the best network per run
through the evolutionary process.

Table 10. Detailing the distribution of memristor numbers
relative to input neuron function, as an average of the
highest-performing network in each run.

Input source neuron   Comparison         P-value
IR sensor             HP vs. PEO-PANI    0.033
                      HP vs. LINEAR      0.074
                      PEO vs. LINEAR     0.933
Light sensor          HP vs. PEO-PANI    1
                      HP vs. LINEAR      0.379
                      PEO vs. LINEAR     0.410

Despite low general appearance rates within the networks, PEO-PANI
memristors are significantly preferred to HP memristors when postsynaptic
to an IR sensor (Table 10, p=0.033). As IR sensors respond only when
near an obstacle, swift attainment of stable high activation to alter network
activation is required. The PEO-PANI profile is also ideal for stably creating
high-efficacy connections via positive STDP, which would make future
obstacle avoidance responses both stronger and quicker. In contrast, no
statistically significant values are found when the input neuron is attached to a
light sensor. Light sensors may be more indifferent to gradual synaptic
efficacy changes, as the state space experienced by those sensors is less
rugged than that experienced by the IR sensors; swift action perturbation
is not required so any synapse type can function equally well.

12. Heterogeneous STDP Analysis

Synaptic plasticity acts to alter the influence of the connections on the 
activity of the network during a trial. For this reason, we analyse the activity 
of the network as it solves the test problem, with particular attention paid 
to the role of STDP in behaviour generation. 

Figure 10. Average (a) weight (b) positive STDP events (c)
negative STDP events through activation, averaged over the
best network from each run (d) Topology of the single
best-performing network. In the hidden layer, darker-coloured
neurons are inhibitory and lighter-coloured neurons are excitatory.

12.1. Runtime analysis: Averages of Best Networks. Each network
took a different number of timesteps to solve the task. Since averages are
taken over the highest performing network in each run, there is an
approximate correlation between the time at which a network executes the
"turn" and the time at which it reaches the goal state and ends the trial.
Therefore, we are able to check for patterns within this timeframe. Initial
analysis revealed that the numbers of STDP events during a trial tend to
oscillate between two or more values as the trial progresses. To account for
erroneous statistics that may arise as a result of these oscillations, all STDP
values (and resultant weights) are averages over the previous 10 timesteps.
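
The smoothing over the previous 10 timesteps can be sketched as a trailing mean. This is a minimal illustration, not the authors' analysis code: the function name is hypothetical, the window is assumed to include the current timestep, and early timesteps are averaged over however many values are available so far.

```python
from collections import deque
from typing import Iterable, List


def trailing_mean(values: Iterable[float], window: int = 10) -> List[float]:
    """Average each value with up to window - 1 preceding values."""
    buf: deque = deque(maxlen=window)
    total = 0.0
    out: List[float] = []
    for v in values:
        if len(buf) == window:
            total -= buf[0]   # the oldest value is about to be evicted
        buf.append(v)
        total += v
        out.append(total / len(buf))
    return out
```

For a count that alternates between two values every timestep, a window of 10 settles on their midpoint, which is the effect needed to remove the oscillation from the reported statistics.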

12.1.1. Memristor weights. Fig. 10(a) shows the average weight per
memristor type during runtime. The PEO-PANI weight profile constantly
increased throughout the duration of the trial, whereas the HP memristor
and linear resistor weights terminated at approximately equal values.
PEO-PANI memristors were universally higher-weighted than HP memristors and
linear resistors at the end of a trial; average final weights were HP memristor
= 0.572, PEO-PANI memristor = 0.673 and linear resistor = 0.591. Between
HP and PEO-PANI memristors a statistically significant p-value of 0.047 is
observed, suggesting that PEO-PANI memristors act more like facilitating
synapses than the other two connection types.

12.1.2. STDP. STDP events were most prevalent during the first 250 timesteps
(Figs. 10(b) and 10(c)). Within this timeframe, HP memristors had low
amounts of positive STDP (36.18) and high amounts of negative STDP
(67.24). PEO-PANI memristors had high (84.45) positive STDP events,
and low (16.48) negative STDP events. Linear resistors possessed comparable
amounts of both types of STDP; 31.23 positive and 25.04 negative
events. Because of this, HP memristors had significantly more negative
STDP events than PEO-PANI did, and PEO-PANI underwent significantly
more positive STDP events than HP (both p<0.001). HP memristors
experience significantly more negative STDP than positive STDP, with the
inverse being true for PEO-PANI memristors (both p<0.001). These results
reinforce the view that the different memristor types are harnessed by the
network as a result of being placed in favourable locations via evolution.

Following this period of heightened STDP activity, STDP events for all
three variable connection types diminish and become more stable; networks
use STDP to "set up" connection weights, which are then affected by further
STDP to induce turning behaviour. It should be noted that STDP generally
involves more positive events than it does negative, i.e. the role of STDP
is mainly to increase levels of activation within the networks.



12.2. Turn Analysis of Best Overall Network. We further refined the
scope of our investigation to cover the single highest performing network,
shown in Fig. 10(d), which solved the task in 709 steps. The focus on the
turn is motivated by activity; as more STDP events occurred during
a turn, this was an obvious timeframe to study.

The network existed in a number of stable states oscillating between (usu- 
ally 2) STDP values, which were observed throughout runtime. Turning 
motion began at timestep 293 and ended at timestep 372, during which 
periodic action switching behaviour between "forward" and "right turn" ac- 
tions were observed. Outside of this range, uniform "forward" actions were 
generated. Performance characteristics of each memristor type at the turn
event are shown in Table 11. HP memristors had lower rates of positive
STDP with respect to negative STDP throughout the turn. The opposite
was true for PEO-PANI memristors, strengthening the notion that evolution
prefers HP memristors as depressing synapses and PEO-PANI memristors
as facilitating synapses. During the turn, HP memristors were seen to
maintain identical numbers of positive STDP events, whereas both PEO-PANI
memristors and linear resistors kept identical negative STDP oscillations.
In particular, two memristors were altered via STDP during runtime to
achieve the desired behaviour, shown in Fig. 11. Firstly, the HP memristor
connecting the second hidden node to the first output node underwent
repeated negative STDP events, which due to the HP memristance profile 
enacted a swift decrease in conductivity. This caused the initial turning 
motion by altering the spike train of the first output neuron from "high ac- 
tivation" to "low activation". Correspondingly, the output action changed 

Table 11. Detailing the STDP activity of memristor types
during and after the turn event.

Memristor Type     HP                PEO-PANI          LINEAR
Start of turn
  Positive STDP    11 to 11/12       26/27 to 17/19    19/21 to 27
  Negative STDP    25/26 to 32/33    6/9 to 14/15      43 to 25/29
After turn
  Positive STDP    11/12 to 13/17    37/41 to 58/63    37/38 to 59/60
  Negative STDP    25/26 to 18/22    14/15 to 2/5      25/29 to 22/23

from constant forward motion to sequential "forward" and "right turn" ac- 
tions. Towards the end of the turn, this connection underwent a restrength- 
ening due to different input node spike trains, allowing the first output neu- 
ron to achieve higher spiking frequency and produce a constant "forward" 
motion. The second memristor in question was postsynaptic to the first 
input node and presynaptic to the 8th hidden node, which was strength- 
ened at the end of the turn. It was reasoned that this memristor allowed 
forwards motion to be generated by compensating for the change in light 
sensor values owing to the orientation of the agent changing. Practically, 
the newly-strengthened memristor caused the 8th hidden neuron to spike 
more frequently. As this neuron was connected to the first output neuron, 
increased activity also caused the output neuron to spike more frequently, 
causing a "high" spike train classification which, in cooperation with the 
activity of the first memristor mentioned above, allowed for the generation 
of stable "forward" motion despite the new agent orientation. 

13. Dynamic reward scenario 

To further test the capabilities of the HET system, we ran an experiment
based on the T-maze scenario (e.g. [67]), where the agent must "forget" its
previously-learned behaviour after a time and adapt to a newly-positioned
goal state. Soltoggio [68] demonstrates the utility of plastic networks in such
dynamic reward scenarios.

For continuity, the sensorimotor space was identical to the previous ex- 
periment (Fig. 3), although the same adaptivity as in the T-maze is required. 
We made the environment more challenging in two ways. Firstly, sensory
noise was added based on Webots Khepera data: ±2% noise for IR sensors
and ±10% noise for light sensors, all randomly sampled from a uniform
distribution. Wheel slippage was also included (10% chance). Secondly, the
location of the reward changed from upper-right to upper-left during the
lifetime of the agent (Fig. 3(b)). It should be noted that the light source
does not move, i.e. with the reward in its second position, the agent is no
longer performing phototaxis.
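
The noise model described above can be sketched as follows. The noise magnitudes and slip probability are taken from the text; treating sensor noise as multiplicative, modelling a slip as a lost wheel command, and all function names are assumptions made for illustration.

```python
import random

IR_NOISE = 0.02     # ±2% for IR sensors (from the text)
LIGHT_NOISE = 0.10  # ±10% for light sensors (from the text)
SLIP_CHANCE = 0.10  # 10% chance of wheel slippage (from the text)


def noisy_reading(value, fraction, rng):
    """Scale a reading by a factor drawn uniformly from [1 - fraction, 1 + fraction].

    Treating the noise as multiplicative is an assumption.
    """
    return value * (1.0 + rng.uniform(-fraction, fraction))


def apply_slippage(wheel_speed, rng):
    """Return 0.0 with probability SLIP_CHANCE, otherwise the commanded speed.

    Modelling a slip as a lost wheel command is an assumption.
    """
    return 0.0 if rng.random() < SLIP_CHANCE else wheel_speed


rng = random.Random(0)  # seeded so the sketch is repeatable
ir = noisy_reading(0.8, IR_NOISE, rng)
light = noisy_reading(0.5, LIGHT_NOISE, rng)
left_wheel = apply_slippage(1.0, rng)
```

Passing an explicit `random.Random` instance keeps the sketch repeatable and lets sensor and actuator noise share one stream.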

Figure 11. Showing the use of plasticity in the single best-performing
network during activation: (a) pre-turn, (b) during turn, (c) after turn.
Darker coloured neurons are inhibitory and lighter coloured neurons are
excitatory.



Each trial was split into two parts; the reward was moved for part 2.
Membrane potentials and synaptic weights were not reset between these
parts so that the agent had memory of the first part. If the agent did not
locate the goal in the first part, it could not receive reward when the goal
was moved. A reward of 1 was given when the agent stably found the first
reward zone (part 1). After this, the reward was relocated to the upper-left
of the environment and part 2 commenced, continuing until the agent
located the new reward zone (for a total fitness of 2), or the step limit was
reached. "Performance" was the number of trials the system took to find
the second reward having located the first reward and was the main metric
for comparison as it measured adaptation speed. All other parameters are
identical to those in Section 9.
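
The two-part fitness assignment can be sketched as below. This is an illustration under stated assumptions: `in_reward_zone` is a hypothetical callback standing in for the simulator and controller, and a single step limit shared across both parts is assumed, since the text does not say whether the limit applies per part.

```python
def run_trial(in_reward_zone, step_limit):
    """Run one two-part trial and return its fitness (0, 1 or 2).

    `in_reward_zone(step, part)` is a hypothetical callback that advances the
    agent by one step and reports whether it has stably reached the reward
    zone for the given part. Network state (membrane potentials, synaptic
    weights) lives inside the callback and is never reset between parts.
    """
    fitness = 0
    part = 1
    for step in range(step_limit):
        if in_reward_zone(step, part):
            fitness += 1
            if part == 1:
                part = 2  # reward relocates to the upper-left; part 2 begins
            else:
                break     # second reward found; trial ends with fitness 2
    return fitness


def toy_agent(step, part):
    """Stand-in agent: reaches reward 1 at step 3 and reward 2 at step 7."""
    return (part == 1 and step == 3) or (part == 2 and step == 7)


fitness_both = run_trial(toy_agent, step_limit=10)
fitness_first_only = run_trial(toy_agent, step_limit=5)
fitness_none = run_trial(toy_agent, step_limit=2)
```

Because part 2 only begins once the first reward has been found, an agent that misses the first reward can never score in part 2, matching the rule stated above.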

In the following experiment, we compare the HET system to a benchmark
GA system, and intend to demonstrate the utility of memristive networks
over those with static connections in this dynamic reward scenario. Results,
shown in Table 12, reveal that HET networks are preferable to GA networks
on every measure, having higher performance, high fitness and average
fitness, as well as lower connectivity and neuron numbers. Significantly,
HET networks are quicker at adapting to the change in reward location
(p=0.037), suggesting that plastic memristive networks are suited to dynamic
tasks. Six of the GA networks could not locate both rewards. The τ parameter
was significantly lower (p=0.049), although this did not lead to a significant
reduction in connectivity. All other parameters (μ, ψ, ω) have different final
values than those in the first experiment, demonstrating the context-sensitivity
of the self-adaptation process.

Figure 12. Average (a) weight, (b) positive STDP events,
(c) negative STDP events through activation, averaged over
the best network from each run in the dynamic scenario; (d)
highest-performing network topology.



13.1. Evolution Analysis. Whereas the previous experiment saw HP be- 
ing preferred to PEO-PANI when connecting hidden-output neurons, evo- 
lution now prefers both HP (p=0.007, avg 3.8) and PEO-PANI (p=0.047, 
avg 4.4) to LIN (avg 2.4) components. This, coupled with the fact that LIN 
synapses are no longer significantly preferred to the other types between two 
hidden layer neurons, suggests that dynamic (nonlinear) activity is required 
by the networks to adapt rapidly. 

HP memristors are also significantly preferred to PEO-PANI memristors
(p=0.032) when connected to a light sensor, with averages of 6 and 4.8
synapses respectively. It is postulated that this favouring of an easily
depressed connection type is one of the ways the networks evolve to deal
with noise, which light sensors experience more than IR sensors.
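
Averages such as those quoted in this section come from counting, for each evolved network, how many connections of each type occupy a given position. A minimal sketch of that bookkeeping follows; the (source kind, target kind, connection type) encoding and the toy networks are assumptions made for illustration, not data from the experiments.

```python
from collections import Counter


def placement_counts(connections):
    """Count connections per (source_kind, target_kind, conn_type) triple."""
    return Counter(connections)


def mean_count(networks, src, dst, conn_type):
    """Average number of `conn_type` connections at a position across networks."""
    total = sum(placement_counts(net)[(src, dst, conn_type)] for net in networks)
    return total / len(networks)


# Two toy networks with made-up connectivity (not experimental data)
networks = [
    [("hidden", "output", "HP"), ("hidden", "output", "PEO-PANI"),
     ("light", "hidden", "HP")],
    [("hidden", "output", "HP"), ("light", "hidden", "HP"),
     ("light", "hidden", "PEO-PANI")],
]
hp_light = mean_count(networks, "light", "hidden", "HP")
peo_output = mean_count(networks, "hidden", "output", "PEO-PANI")
lin_output = mean_count(networks, "hidden", "output", "LIN")
```

A `Counter` returns zero for triples that never occur, so connection types absent from a position need no special handling.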



13.2. STDP Analysis. Average synapse weights and STDP performance
can be seen in Fig. 12. STDP is initially used as in the previous experiment,

Table 12. Showing averages (standard deviations) and p-values
of HET and GA systems on the dynamic reward scenario.

               HET             GA              p-value
Performance    57.8 (52.5)     541.4 (364)     0.037
High fit       2 (0)           1.8 (0.45)      0.373
Avg fit        1.12 (0.08)     1.03 (0.04)     0.155
Conns (%)      52.21 (1.61)    52.78 (1.29)    0.109
Nodes          16.79 (0.32)    16.85 (0.12)    0.739
μ              0.06 (0.01)     0.08 (0.02)     0.111
ψ              0.08 (0.02)     0.09 (0.02)     0.192
ω              0.25 (0.09)     0.30 (0.05)     0.422
τ              0.05 (0.01)     0.06 (0.01)     0.049

to "set up" connection weights. The major result from STDP analysis of 
the best network from each run is that the average weight per synapse type 
varies between the two parts of the trial (Fig. 12(a)). In contrast to the
previous experiment, PEO-PANI weights are low during the first part, and
suddenly increase at the start of the second part (approximately timestep
750). In all networks considered, PEO-PANI synapse weight varied more
widely than the other types during a trial. It is reasoned that PEO-PANI
synapses were used as they can affect network activity more stably, and
more strongly per positive STDP event. Average PEO-PANI weight (0.539)
was significantly higher than average HP weight (0.507, p<0.001). 

The use of STDP within the networks is shown in Figs. 12(b) and (c).
STDP is initially similar to the previous experiment, with a slight increase
in all types of positive STDP at timestep 750. This coincides with a decrease
in HP negative STDP, and an increase in PEO and LIN negative STDP
around the same timeframe. Due to the nature of the PEO-PANI profile,
this slight increase in positive STDP corresponds to the large increase in
average weight seen in Fig. 12(a), allowing the network to successfully solve
the environment. Overall, HP memristors undergo significantly more negative
STDP than positive STDP, with the inverse being true of PEO-PANI
memristors. Noise is seen to be handled in some networks by STDP.
Specifically, light sensors are frequently seen attached to inhibitory neurons,
which can act via STDP to reduce the impact of those inputs on the activity
of the network. To give some idea of topology, the best network is shown in
Fig. 12(d). It should be noted that, despite the increased complexity of
the task, this network contains fewer hidden layer neurons (17 vs. 20) and
memristors (91 vs. 109) than the best network from the static environment.

14. Summary

In this paper we have demonstrated the first evolutionary approach to de- 
signing memristive SNNs for obstacle avoidance/dynamic reward tasks. We 
have shown that plasticity can be harnessed by the networks via STDP to 
achieve more expedient goal-finding behaviour with no significant downside 
in terms of topological complexity. Results indicate that, in possible NC 
implementations, heterogeneous mixtures of memristors possess advantages 
compared to both constant connections and networks of a single memristor 
type. Self-adaptive parameters were found to vary depending on the variable
connection type in the network. It is important to note that internal
memristive network dynamics have no analogue in the GA case; those be- 
haviours cannot be replicated by GA networks. Overall it can be seen that 
all of our research questions have been answered and the hypotheses suffi- 
ciently demonstrated, as highlighted in the conclusions drawn from each set 
of experiments and summarised below. 

The original hypothesis, "that memristive synapses provide the networks
with increased performance", was confirmed as PEO and LIN networks
outperformed GA networks, and PEO networks evolved higher fitness solutions
than GA networks.

The heterogeneous network hypothesis, "to confirm that varied memristive
behaviours can be harnessed by the evolutionary design process to provide
further advantages, specifically that (i) certain functionality can be more
easily achieved by certain memristor types (ii) combinations of memristor
types are beneficial to the networks", was supported as heterogeneous networks
were higher performing than LIN, PEO and GA networks, and generated
higher fitness solutions than LIN and GA networks. They were also shown
to outperform GA networks in a dynamic reward scenario.

Research question 1 - "Does the evolutionary process allow for the successful
generation of memristive networks that outperform constant-valued
connections, despite the memristors' nonlinearity and given the potential for
complex interactions within memristive networks?" - was answered as PEO
networks were successfully evolved to outperform, and generate higher fitness
solutions than, GA networks.

Research question 2 - "In the heterogeneous case, do mixtures of memristors
provide better performance than other implementations? How do
such networks generate useful behaviour?" - was answered as heterogeneous
networks had higher performance characteristics than PEO, LIN and GA
networks. Useful behaviour was generated on an evolutionary level by assigning
positions to the memristors based on their profiles, and on a runtime
level by generating STDP to alter synaptic efficacies to exploit properties
of those profiles.

Research question 3 - "Is there an evolutionary preference in assigning
specific roles to specific memristor types based on variations in their
memristive behaviours?" - was answered as a number of statistically significant
differences with respect to the placement of specific connection types in
certain positions in the networks were found.

Biological brains contain mixtures of synapses that have specific types 
(e.g. depressing, facilitating) based on their performance characteristics [69];
results suggest that the evolutionary process casts the HP memristor as a
depressing synapse and the PEO-PANI as a facilitating synapse within the
heterogeneous networks. With statistical significance, in both static and
dynamic scenarios, PEO-PANI synapses achieved higher average efficiency 
than HP synapses and underwent more positive STDP than negative STDP, 
as well as undergoing more positive STDP than HP memristors did. The inverse
is true of the HP memristor when compared to the PEO-PANI. Biological
brains also place these varied synapse types in certain contexts (e.g. [70]
gives examples of depressing synapses being typically found between two
pyramidal neurons and facilitating synapses frequently connecting
pyramidal neurons and interneurons). Initial findings provide compelling
evidence that evolution of heterogeneous networks shares this feature; nu- 
merous examples have been reported herein including (i) PEO-PANI being 
preferred to HP when attached to IR sensors (ii) LIN being preferred to HP 
and PEO-PANI, being used to generate more stable behaviour between two 
hidden layer neurons (iii) HP being preferred to PEO-PANI when connecting
to output layer neurons. In the dynamic case, (i) HP were preferred to
PEO-PANI when attached to light sensors (ii) both HP and PEO-PANI were
preferred to LIN when connecting to output neurons. It is clear that the
memristors are assigned types, and thus roles, within the networks based 
on their profiles. It is also shown that the role for a specific type can vary 
based on the environment the controller encounters. 

The introduction highlighted the implementation of neuromorphic struc- 
tures as motivation for conducting this research. The use of physical equa- 
tions to model memristive behaviour makes a future hardware implementa- 
tion more viable. Performance of the yet-to-be-realised LIN component indi- 
cates that more linear memristive behaviours may be beneficial, especially in 
a heterogeneous scenario. As the memristors have a constant initial weight 
that is not affected by GA activity, memristive networks are initially handicapped
with fewer degrees of behavioural freedom. Despite this fact, they are
shown to adapt by allowing environmental signals to alter synaptic efficiency 
to outperform (in some cases) the GA approach on the test problem. The 
networks evolved are admittedly at a much smaller scale than those required 
by the neuromorphic paradigm. Scalability is more likely to be possible due 
to the inclusion of constructivism and self-adaptive search parameters, pro- 
vided that the innate self-organising properties of the networks can account 
for the increased complexity of intra-network communications. 